1   Introduction


CLEOPATRA, the "Comprehensive Language For Elegant Operating
System And Translator Design", was designed in 1974 by Schreiner
[1,2] as a high-level, block-structured, extensible language
aimed primarily at operating systems implementors.

During 1975 a first attempt at an implementation of the
language was made in PL/I on an IBM/360.  This version
implemented only a subset of CLEOPATRA and was intended to be a
bootstrap so that future CLEOPATRA compilers could be written in
the language itself.  The compiler consisted of two phases:
analysis and code generation.  The former performed lexical,
syntactic, and semantic analysis and produced a symbol table and
intermediate text.  The latter phase operated on the intermediate
text to produce object code in the form of a relocatable binary
deck suitable for input into the OS/360 loader.

Unfortunately, soon after the completion of this project the
analysis phase implementation was found to be inadequate;  it was
bug-ridden and, because of inferior coding, would have required
substantial effort to correct.  Therefore, the decision was made
not to attempt repair of the bad code, but to rewrite it.  During
1976 a second attempt was made;  the implementation of the old
code generation phase (Halbur [3,4]) was considered to be of
sufficient quality to retain, at least for the present, so the
new analysis phase (Fisher [5]) was written to fit the coder's
specifications.


The faults of the original analysis phase and their
detection at such a late date resulted from meager and ad hoc
testing methods during its development.  In order to preclude
running amok in the same fashion a second time, an extensive
testing effort has been made concurrently with the development of
Fisher's implementation;  a documentation of the testing effort
comprises the body of this thesis.

The goal of the testing is twofold:  first, to demonstrate
that Fisher's implementation of the analysis phase is "correct"
(or, more realistically, to document any deviations from the
specifications) so that later CLEOPATRA compilers can be assured
of a solid basis;  and second, to produce a test data package
which can be permanently associated with the compiler and used
for regression testing when the currently implemented subset is
expanded.


Chapter 2 discusses the difficult problem of testing a large
piece of software and the basic philosophies that this testing
effort follows.  Subsequent chapters detail the design,
production, and execution of test cases for each component of the
compiler's analysis phase:  Chapter 3 on lexical analysis,
Chapter 4 on syntactic analysis, and Chapter 5 on semantic
analysis.  The final chapter summarizes the results.  Appendix A
gives the definition of the language actually accepted by this
implementation of the analysis phase, and Appendix B presents the
complete test data generated.


This thesis is intended as a companion to the reports
mentioned above.  A knowledge of the information contained in [1]
and  [  1 1  is  essential  to  understanding  the  material  discussed 
here.  To minimize confusion, the terminology and notation used
in  this  thesis  are,  as  much  as  possible,  consistent  with  the 
earlier  reports. 


2   Testing  Philosophy 

The intent to validate a piece of software immediately gives
rise to many questions.  Some of them are:

-  What  does  validation  mean? 

-  What  testing  methods  are  available?

-  Which  method  is  to  be  used  and  why? 

-  How  is  the  method  to  be  carried  out? 

Before discussing the actual validation of CLEOPATRA's analysis
phase, it would be beneficial to answer these questions.

2.1   Definition  of  Validation 

Gruenberger [6] gives a very useful definition of
validation.  Assume that a program has been designed and encoded,
and the implementor has debugged the code, thus removing the
mechanical errors of coding and producing a superficially correct
program.  The program as written clearly solves some problem;
the purpose of validation is to show that it solves the desired
problem.


Validation usually involves testing performed by an outside
agent playing the role of the "Devil's Advocate".  This testing
has a dual purpose:  showing that all functions in the program
specifications are implemented, and showing that all implemented
functions are specified.  These are what Elmendorf [7] calls
"specification testing" and "program testing," respectively.  If
the two sets of functions are identical, the correct problem has
been solved, and the program has been validated.


However, if the two sets of functions are not identical, the
testing will have pinpointed the aberrant function(s).  The cause
of the deviation could be inconsistent, poorly-stated or
erroneous specifications or miscoding (or even a combination of
these).  The implementor must then decide how best to remedy the
discrepancy and carry out the correction;  the testing can then be
repeated.

The repetition of the entire bank of test cases run against
the software after each repair of the code is called "regression
testing" (Brown and Sampson [8]).  Any modification of the
program runs the risk of introducing an error which was not
present before the change was made and which may be totally
unrelated to the successful implementation of the change itself.
So testing must be repeated to insure that previous work has not
been damaged in some obscure way by the alteration.

2.2   Available  Testing  Methods

There have been many methods of software testing discussed
in the literature during the past few years.  Some of these
techniques are more thorough than others, both in a formal sense
(Goodenough and Gerhart [9]) and in relation to the concepts of
program and specification testing as discussed above.  Some are
more costly in human time or computing resources or both.
Therefore a practical balance between thoroughness and expense
must be found.


One way to show the correctness of a program is by proof.
However, there is rarely enough information available to allow a
formal proof of correctness.  As London [10] points out,
assumptions must be made about the semantics of the programming
language used, the completeness of the problem domain
specification, and the correctness of the proof itself.  Going a step
further, the entire running environment of the program (language,
operating system, and hardware processors) must be completely
axiomatized and proved consistent before the program can be
formally implemented and proved within the system.  This is
clearly beyond current state-of-the-art techniques, so more
informal proof methods are employed.  But these less formal
proofs are more prone to error.  Goodenough and Gerhart show that
several programs which have been "proven" in the literature
contain numerous bugs.  It is interesting to note that many of
these bugs are due to imprecisely stated specifications and would
have been found if even rudimentary test cases had been run.

Another testing method, which relies on brute-force
computing power, is to test all possible input combinations.
Even for a program having a very small number of inputs,
exhausting input combinations is absurdly impractical.  Take, for
example, a program with two inputs X and Y that computes one
output Z = F(X,Y) (Huang [11]).  If X and Y are represented in 32-
bit registers, there are 2^32 x 2^32 = 2^64 possible input combinations.
Even if the program itself took an average of only one
millisecond per execution, it would take more than 50 billion
years to complete the test!


The next three methods involve similar approaches.  The
first method is to generate test cases that exercise every
statement of the program.  This test is practical, since choosing
the test data is a fairly straightforward, mechanical process.
However, there are several types of errors that are not
necessarily discovered using this technique.  Erroneous transfer
of control may not be found since it is possible to execute every
statement in a program and not traverse all (possibly faulty)
control paths.  Also, missing control paths (i.e., failure to
examine a special case) probably will not be discovered since the
choice of test data is based on the program code and not on the
program specification.

The second method, which is much more likely to detect
errors, is to require that each and every control path be
executed at least once.  This method guarantees that erroneous
transfers of control will be found, and also detects subtle
errors which are contingent on the exact sequence of previous
events in the execution.  However, the method is impractical due
to loop constructs.  A program with a loop has at least as many
different control paths as the number of times the loop can be
iterated, and for programs with several loops the total number of
paths is the product of the paths through each individual loop.
Obviously, this can lead to a prohibitively large number of paths
in many cases.

A more practical approach, a compromise between the previous
two, is to generate test data that exercises all program
statements and all branches.  This is equivalent to requiring
that each edge of the flowchart corresponding to the program be
traversed at least once (Huang [11]).  This scheme is reasonably
effective, but still does not insure that all errors will be
detected;  those such as missing control paths, incorrect path
selection, and an incorrect or missing action may not be found
due to an (un)fortunate choice of test data.
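
The difference between exercising all statements and exercising
all branches can be sketched in a few lines.  The example below is
written in Python rather than the PL/I of the compiler under test,
and everything in it is hypothetical: the routine `scale`, its
seeded bug, and the `edges` bookkeeping exist only to illustrate
the point made above.

```python
def scale(x, flag, edges):
    """Hypothetical routine. Intended behaviour: 2*x when flag is set, else x."""
    edges.add(("if", bool(flag)))  # record which edge leaves the decision
    result = 0                     # seeded bug: should be `result = x`
    if flag:
        result = 2 * x
    return result


def expected(x, flag):
    """The specification the routine is supposed to meet."""
    return 2 * x if flag else x


def run(tests):
    """Run every (x, flag) case; return the edges covered and the failing cases."""
    edges = set()
    failures = [t for t in tests if scale(t[0], t[1], edges) != expected(*t)]
    return edges, failures


# One case executes every statement of scale, yet the bug goes unnoticed.
statement_suite = [(3, True)]
# Covering both edges of the decision forces the faulty path to run.
branch_suite = [(3, True), (3, False)]

if __name__ == "__main__":
    print(run(statement_suite))
    print(run(branch_suite))
```

The single case in `statement_suite` touches every statement but
only one edge of the decision;  the second case in `branch_suite`
traverses the other edge and exposes the faulty initialization.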

The last technique to be discussed has a completely
different emphasis.  Instead of basing tests on the program
itself, this method generates test cases directly from the
specifications, using program structure only to complement the
specification information.  Goodenough and Gerhart describe how
decision tables may be employed to facilitate identification of
"all conditions relevant to a program's correct operation," from
which test cases are generated that exercise all possible
combinations of these conditions.  This is an effective
technique, but it involves an element of insight on the part of
the tester to correctly determine the "relevant conditions" from
the specifications and program.

2.3   Selecting  a  Method 

Clearly, some of the techniques described above are more
suited than others to a project such as verifying a compiler.
The criteria used in rating these methods include practicality,
effectiveness, and expense.

Attempting to formally prove a piece of software as
substantial as a compiler is an exercise in complexity and
frustration.  The applicable techniques are extremely tedious and
rapidly become unmanageable.  Use of less formal techniques leads
to "proofs" which are much less conclusive than one would desire.
Testing by exhausting input combinations is similarly
impractical:  the complexity of the input domain of the CLEOPATRA
compiler results in sets of test data which are intolerably
large.  Random sampling is feasible, but it is very difficult to
objectively measure the degree of success of this method since
the test process is entirely subjective.  Using test data that
exercise all statements of a program is easier to evaluate, but
is not effective enough in exposing errors.  Exercising all
program paths is impractical due to the sheer number of possible
paths;  the problems in using this method are similar to those
encountered with exhaustive input testing.

Two testing attacks remain:  exercising all program
statements and branches, and exercising all combinations of
specification conditions.  The former approach implements program
testing, but ignores the specifications;  the latter implements
specification testing but tests the program only to a lesser
degree.  Since neither of these individually constitutes a
sufficient test, the logical answer is to combine the two
approaches, obtaining a testing method that performs both program
and specification testing.  Therefore, the attack to be taken in
the validation of the CLEOPATRA analysis phase is to first
generate test cases from the specifications and from a limited
knowledge of the program's structure so that all combinations of
relevant conditions are exercised, and then to verify that these
test cases do indeed cause the execution of all program
statements and branches.

2.4   Implementation  of  the  Testing

Whereas good program design is performed top-down, program
testing is normally performed bottom-up.  Hetzel [12] advocates
"segmentation," the concept of testing new functions only after
testing all sub-functions.  The advantages of segmentation are
increased conceptual clarity, economy of test cases, and the
ability to do more testing in parallel with program development.

This modular approach was used in the validation effort to
be discussed.  Test cases for sub-functions were generated using
Goodenough and Gerhart's decision table technique, then the test
cases for all the sub-functions were synthesized to test the
complete function.  This last, unified test was run on the PL/I
optimizing compiler, which offered an option of counting
statements executed and branches taken, in order to verify that
all program statements and branches were exercised.

The major modules considered in this validation were
lexical, syntactic and semantic analysis.  Each of these modules
was tested bottom-up, beginning with lexical analysis, and the
correctness of each successive module depended on the correctness
of the preceding module.  The bulk of this thesis is composed of
a detailed account of the validation of each of these major
modules.  The specifications used for the testing were the
original language definition (Schreiner [1], hereafter called
"the Report"), abbreviated by Halbur's subset definition [4], and
modified somewhat by Fisher's implementation considerations [5].

3   Lexical  Analysis

Because lexical analysis, the identifying and isolating of
tokens from an input stream, is the foundation of any compiler,
the testing of this module was very thorough.  The aim was to
insure that any legal token (and many marginally illegal ones)
would be correctly isolated from any possible environment.  This
resulted in a greater concern with how the lexical analysis
handled illegal input than with how later modules processed
illegal input.  The sections of this chapter describe the
sub-functions that were tested, including the character set mapping,
the detection of delimiting boundaries, and the isolation of
tokens (comments, identifiers, operators and constants).  The
last section discusses the integration of all sub-function test
cases and the testing of the complete lexical analysis module.

3.1   Character  Set  Mapping

The kernel of lexical analysis is the mapping of the input
characters into the functional classes of character types
(letter, digit, delimiter, special and control).  The
specifications against which the code was tested were productions 1.1
through 1.5 of the Report.  The code for this mapping function
was isolated, and all characters that could be input via a DEC/10
terminal were presented to the function.  The results were as
follows.  (See Appendix A for an explanation of the BNF notation.)

(1.1)  letter  ::=  A | B | C | D | E | F | G | H | I | J | K |
                    L | M | N | O | P | Q | R | S | T | U | V | W | X |
                    Y | Z | a | b | c | d | e | f | g | h | i | j |
                    k | l | m | n | o | p | q | r | s | t | u | v | w |
                    x | y | z

(1.2)  delimiting_character  ::=  ( | ) | . | : | , | ' | ; |
                                  blank_character | _

(1.3)  special_character  ::=  a  |  *  i  S  |  «  |  &  I  *  I  -  I  ♦  I  = 

I  ■  I  ?  t  /  I  <  I  >  I  I  I  *-'l  - 

(1.4)  digit  ::=  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

(1.5)  control  ::=  !

Productions 1.1 and 1.4 agree exactly with the
specifications.  Production 1.2 differs only in that the underscore
character has been added;  this will be discussed further with
identifiers.  The characters < and > do not appear in production
1.3, having been deleted from the character set because of the
inability to enter them via input devices available at this
installation.  Square brackets [ and ] have also been deleted
from the character set.  The user is cautioned that the character
t (which is entered as a * on a keypunch, but as a [ on a DEC/10
terminal) prints on the PN train printer as a [;  this is due to
the internal character mapping of the system link between the
DEC/10 and IBM/360, and a policy decision was made to let it
remain so.  Production 1.5 differs from the specifications in
that the backspace character was deleted, again because of
linkage problems, and the end_of_source_record "character" is not
mentioned here since it is more a state of the lexical analyzer
and is therefore not given a mapping.




3.2   Delimiting  Boundaries 

The delimiting boundaries (as described in section 1.1 of
the Report) between the characters of the five character classes
are defined by the following rule:  there is an understood
delimiting boundary between any two characters, except that there
is no delimiting boundary between

-  two  letters 

-  a  letter  and  a  digit 

-  a  digit  and  a  letter 

-  two  digits 

-  two  special  characters. 

In order to verify that this rule holds, a decision matrix
was constructed representing the cross product of the set of five
character types with itself.  That is, an entry in row i and
column j of the matrix indicates that a character of type j
immediately follows a character of type i.  Test data were
generated so that there was a boundary corresponding to each
position in the matrix.  Figure 1 shows the test data along with
the matrix, where each position is filled with the character pair
satisfying that boundary.
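
The boundary rule and the 5 x 5 decision matrix can be written
down compactly.  The following is a minimal sketch in Python, not
the scanner's actual code;  the names `CLASSES`, `NO_BOUNDARY`,
`has_boundary` and `decision_matrix` are assumptions introduced
for illustration, and only the rule itself comes from the text.

```python
from itertools import product

# The five character classes of section 3.1.
CLASSES = ["letter", "digit", "delimiter", "special", "control"]

# Ordered pairs (type i, type j) between which the rule says there is
# NO delimiting boundary; every other pair has one.
NO_BOUNDARY = {
    ("letter", "letter"),
    ("letter", "digit"),
    ("digit", "letter"),
    ("digit", "digit"),
    ("special", "special"),
}


def has_boundary(first, second):
    """True when a character of class `second` following one of class
    `first` starts a new token according to the rule."""
    return (first, second) not in NO_BOUNDARY


def decision_matrix():
    """Cross product of the classes with itself: one entry per test case."""
    return {(i, j): has_boundary(i, j) for i, j in product(CLASSES, repeat=2)}


if __name__ == "__main__":
    for (i, j), boundary in decision_matrix().items():
        print(f"{i:9} -> {j:9} {'boundary' if boundary else 'no boundary'}")
```

Each of the 25 entries corresponds to one character pair that the
test data must contain.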

Figure 1
Delimiter Matrix and Test Data
(rows and columns: letter, digit, delim, special, control)


The results of the test, in the form of tokens identified,
are listed in Figure 2.  These results agree with the boundary
rule except in the case of a digit followed by a letter:  if the
digit (or digit string) is immediately preceded by an alphabetic
character, there is no boundary as the rule predicts;  however,
if the digit string is immediately preceded by a non-alphabetic
character, then there is a delimiting boundary between the last
digit of the string and the following letter.  This rather
confusing exception is illustrated by two examples from the test
data:  in the sequence 'AB7:83R' there is a boundary between the
'3' and the 'R' since the digit string '83' is preceded by a
':';  on the other hand, in the sequence '<G5E' there is no
boundary between the '5' and the 'E' since the digit string '5'
is preceded by a 'G'.


This minor deviation from the specifications was not
corrected since it would have involved a fairly major redesign of
the scanner.  As will be seen later, this discrepancy, however
small it might seem, causes adverse side-effects in the
processing of numeric constants that contain illegal characters.




card  #1:  AB7  :     83    R     ;     9$ 

card  #2:  *  L

card  #3:  (  9

card  #4:  <  G5E   )

card  #5:  H  =     .     .     T 


Figure  2 
Tokens  Identified  from  Delimiter  Test 


3.3   Comment  Tokens

Comments  are  specified  in  section  1.5  of  the  Report  as: 

(1.6)  comment  ::=  COMMENT any sequence of characters with the
                     exception of a semicolon ;

(1.7)           ::=  ! any sequence of characters with the
                     exception of an end of the source record
                     end_of_source_record

The only change in Schreiner's specifications is that a comment
is not equivalent to a blank character, as stated in the Report;
it does act as a general delimiter where it appears, but it may
not occur wherever a blank character is legal.  In particular, a
comment is not allowed within a constant (as described in section
3.5) though blanks are allowed.

A reasonable test case for comments is to construct a
comment of each type containing every possible character
(including an undefined character, but of course no semicolon in
the first type of comment), along with the string 'COMMENT'.
This insures that neither single characters nor "embedded
comments" interfere with the isolating of the comments.  Figure 3
shows the test cases executed.  (Notice that the string 'COMMENT'
must be surrounded by delimiting boundaries.)  The result was
that the tokens 1, 2, 3 and 4 were correctly isolated, and the
comments were properly ignored.


| 1C0HHENT    0123U56789    ABCDEPGH IJKLNNOPOFSTUVBXYZ    !().:», _»*$*| 6*-»  =  "<>?/[  ]C0MHENT 
| abcdefghi jklinopqrstuvwxyz    ;    2  ICOHHBMT    abcdef ghi  jlcLinopqrstavwxyz 

I  3UBCDEFGHIJKLHHOPQFSTUVWIYZ    0123«l56789    !S#JX|&*()     ♦-«][  "•:;<>?,.  / 


Figure  3 
Test  Cases  for  Comment  Tokens


3.4   Identifier  and  Operator  Tokens


The main specifications for identifier and operator tokens
are again taken from the Report (section 2.1):

(2.1)  identifier  ::=  letter { letter | digit }*

(2.2)  operator  ::=  { special_character }* | identifier

There are only two exceptions to the discussion in the Report.
First, the transparent underscore is not implemented;  an
underscore appearing within an identifier acts as a delimiter,
thus dividing the identifier.  (This is the reason the underscore
was added to the character set as a delimiting character.)
Second, there is no explicit maximum length on either
identifiers or operators;  they are limited in length only by the
restriction that they cannot be broken across source record
boundaries.  Therefore, in this implementation these tokens may
be up to 80 characters long, if positioned properly.

From these specifications the decision tables in Figure 4
were produced, enumerating cases to be tested for identifiers and
operators.  Figure 5 shows actual test cases satisfying the
criteria of the decision tables and the results after execution.
All tokens were isolated correctly, verifying the correctness of
the identifier/operator handling portion of the lexical analysis.

Identifiers
legal cases:                     1  2  3  4  5  6  7  8

ends with letter or digit        L  L  L  L  D  D  D  D

contains interior letters        N  Y  N  Y  N  Y  N  Y

contains interior digits         N  N  Y  Y  N  N  Y  Y

Pathological Cases:  starts with a digit
                     broken across card boundary
                     contains a blank
                     contains an underscore
                     contains other illegal character

Operators
legal cases:                     1  2

length>1                         Y  N

Pathological Cases:  broken across card boundary
                     contains a blank
                     contains other illegal character


Figure 4
Decision Tables for Identifiers and Operators

II   D0918273645D  X9BCD8BFG7HI J6KLH5WOP0QFS3TUV2WITZ1 0  ZTXHVOTSHQPOHHLKJIHGPEDCBA  I
I XONBERS   OHDBR_SCORE   A BCD0EMGH2IJ3KLttHH5OP6QR7ST8 0V9WXTZ   G0123456789   K9     *| 
|iillB»-*»"?/<>|f-»  $♦->:<*/$    7BBONG   A*TI"ES   BIGHTS   SEHI;COL3I 
I   naBbers      IDEMTifier 

L 


Figure  5 
Identifier  and  Operator  Test  Cases 
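
The eight legal identifier cases of Figure 4 are simply every
combination of three two-valued conditions, so they can be
enumerated mechanically.  The sketch below, in Python, is an
illustration only:  the sample identifiers it builds ('AZ',
'ABC129', and so on) and the names `IDENTIFIER` and
`legal_identifier_cases` are hypothetical and are not the test
data of Figure 5.

```python
import re
from itertools import product

# Production (2.1): a letter followed by any number of letters or digits.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*\Z")


def legal_identifier_cases():
    """Enumerate the legal-case columns of the identifier decision table.

    Returns tuples (ending, interior_letters, interior_digits, sample), in
    the column order 1..8 of the table; `sample` is an illustrative
    identifier satisfying that column.
    """
    cases = []
    # The rightmost factor varies fastest, which reproduces the table's order.
    for ending, digits, letters in product("LD", "NY", "NY"):
        interior = ("BC" if letters == "Y" else "") + ("12" if digits == "Y" else "")
        last = "Z" if ending == "L" else "9"
        cases.append((ending, letters, digits, "A" + interior + last))
    return cases


if __name__ == "__main__":
    for number, case in enumerate(legal_identifier_cases(), start=1):
        print(number, *case)
```

The pathological cases are not generated here, since each of them
violates production (2.1) in its own way and has to be written by
hand.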


3.5   Numeric  Constants 

The specifications for constants, as are the specifications
for the remainder of this testing, are based on Halbur [4] with
only minor modifications.  Halbur's productions 2.1 through 2.11
are correct, but 2.15 and 2.17 have been changed to:

(2.15)     long_integer  ::=  F. decimal_string

(2.17)     bit  ::=  S. binary_digit

This change in the BNF reflects the restriction that only one of
a base or a size can be specified in one constant (e.g., F.X.7FFF
is not legal, even though X.7FFF and F.32767 are legal).  The
only other modification is that one or more blanks are allowed
between the letter-dot preface and the minus symbol, if it
appears, or between the preface and the digit string if there is
no minus present.

The decision table given in Figure 6 was constructed from
the specification productions, and the test cases shown in Figure
7 were generated to satisfy the decision table conditions.  The
results were not completely as expected.


Figure 7
Test Data for Constants


The major problem hinged on constants that exceeded the
specification bounds for minimum and maximum values (see Figure
8).  First, no provision had been made in the code to detect
constants whose values were out-of-bounds.  Therefore, the first
time an extremely large constant was encountered, the
FIXEDOVERFLOW condition occurred and aborted the execution of
lexical analysis.  Clearly, this drastic an error "message" was
undesirable, so an ON FIXEDOVERFLOW unit, which printed a
reasonable error message and set the constant's value to zero,
was incorporated into the procedure that calculates constant
values, and the test was rerun.  Unfortunately, this repair did
not completely correct the problem.  This time constants of
extremely large magnitude (>2147483648) were correctly detected,
but decimal constants whose values were only moderately
out-of-bounds (32767<|x|<2147483648) were not flagged, and both decimal
and long_integer constants in this range were given spurious
values.  The cause of this turned out to be the mixing of FIXED
BINARY 15 and 31 variables to hold the constant's computed value,
causing the loss of high-order significant bits.  The problem was
finally solved by uniformly using FIXED BINARY 31 variables to
hold the computed value and checking normal decimal constants to
insure that their values were within the correct range.
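
The "spurious values" are what one would expect when a value that
needs more than 15 bits is squeezed into a halfword.  The sketch
below, in Python, assumes that the loss of high-order bits behaved
like storing into a 16-bit two's-complement halfword (the
System/360 representation of a FIXED BINARY 15 variable);  the
thesis does not list the actual values produced, and the function
name `store_halfword` is hypothetical.

```python
def store_halfword(value):
    """Keep only the low-order 16 bits of `value` and reinterpret them as a
    two's-complement halfword, mimicking the assumed loss of high-order bits."""
    low = value & 0xFFFF
    return low - 0x10000 if low & 0x8000 else low


if __name__ == "__main__":
    for value in (32767, 32768, 40000, 65541):
        print(value, "->", store_halfword(value))
```

Values up to 32767 survive unchanged;  anything larger silently
turns into a different number, which is why the moderately
out-of-bounds constants were not flagged.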

Second, the value F.-2147483648 was flagged as being out of
bounds.  The reason for this was that, by definition, the value
of a negative constant is -1 times the positive value computed.
Therefore, when the positive value 2147483648 was computed, it
was correctly flagged as out-of-range before the value could be
negated.  This indicates that the specifications were
inconsistent;  the range of long_integer constants should be
|long_integer| < 2147483648, and not as shown in Figure 8.
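
The order of operations that causes this can be sketched as
follows.  This is a Python illustration under stated assumptions,
not the PL/I procedure itself:  the names `LONG_MAX`, `OutOfRange`
and `long_integer_value` are hypothetical, and the body of the
constant is assumed to contain only an optional minus sign and
decimal digits.

```python
LONG_MAX = 2**31 - 1  # largest value a FIXED BINARY 31 variable can hold


class OutOfRange(ValueError):
    """Raised when the magnitude of a constant exceeds LONG_MAX."""


def long_integer_value(body):
    """Evaluate the text after an F. preface, e.g. '-2147483647'.

    The magnitude is accumulated as a positive number and checked against
    LONG_MAX BEFORE the sign is applied, as described in the text.
    """
    text = body.strip()  # blanks may follow the preface
    negative = text.startswith("-")
    digits = text[1:] if negative else text
    magnitude = 0
    for ch in digits:
        magnitude = magnitude * 10 + int(ch)
        if magnitude > LONG_MAX:
            raise OutOfRange(body)
    return -magnitude if negative else magnitude


if __name__ == "__main__":
    print(long_integer_value("-2147483647"))
    try:
        long_integer_value("-2147483648")
    except OutOfRange as error:
        print("out of range:", error)
```

Because the check is made on the magnitude, the most negative
value that can be accepted is -2147483647, which is the symmetric
range the paragraph above arrives at.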

                 MAXIMUM          MINIMUM

DECIMAL          32767            -32768

HEXADECIMAL      X.7FFF           X.-8000

LONG_INTEGER     F.2147483647     F.-2147483648

BIT              S.1              S.0


Figure 8
Range of Constant Values


A third problem that this test data revealed was that
constants with an w. preface which contained any letters A
through F were evaluated interpreting these letters as
hexadecimal digits.  The cause of this problem was a local
character set mapping for digits that distinguished between
binary, octal and other digits, but did not distinguish between
decimal and hexadecimal digits.  The mapping was modified, and a
rerun of the test data showed that the problem was solved.

The fourth inconsistency involves the processing of those
constants containing illegal blanks following the minus sign.
When a negative decimal constant contains blanks separating the
minus sign and the digit string, the string is interpreted as an
operator followed by a positive decimal constant, which is
correct according to the specifications.  If a hexadecimal or
binary constant contains one or more blanks following the minus,
the entire token is correctly isolated and flagged as being in
error, which again is what the specifications stipulate.
However, if a long_integer constant contains illegal blanks after
the minus, the F.- is tokenized and given a value of zero;  this
forces the string that would have been the body of the constant
to be analyzed on its own individual characteristics (being typed
as either a decimal string or identifier or decimal string
followed by an identifier) giving unexpected and unwanted
results.  Similarly, if an S. is followed by any character other
than a binary digit or a string of blanks followed by a binary
digit, the S. is tokenized and given a value of FALSE, and the
rest of the string is interpreted on its own.  The processing of
long_integer constants was changed so that it was consistent with
hexadecimal and binary constants (i.e., the entire token was
isolated and flagged as illegal), but it was decided to let the
bit constant handling remain as it was.

The last problem uncovered by this test data was the
handling of those numeric constants containing illegal
characters, such as X.-5Q31 or 57B3. In cases like these, the
constant is interpreted as containing all the characters up to but
not including the illegal character; then everything from that
character on is interpreted on its own. As a result, these
constants are interpreted as a legal constant followed by another
token (or tokens), and not as a single illegal constant as one
would expect, since it is in general more probable that the
illegitimate letter was a typographical error (as in using the
letter O instead of the digit zero) or resulted from using a digit
of an incorrect base. The solution to this problem mainly consists
of repairing the digit/letter boundary problem discussed in section
3.2, which, as stated before, was not done.

3.6   Character  Literals 

Character constants are defined by:
(2.28)     literal_value ::= C. any sequence of characters up to
               but not including the first following
               blank_character
That is, the first character following the C. preface begins the
literal_value, and the first blank_character following the C.
ends the literal_value. Additionally,

     End_of_source_record is transparent to a literal_value
     and does not become a part of it.

     The underscore character represents a blank_character
     in the literal_value.

     Neither a ! nor the string COMMENT initiates comments;
     they are considered part of the literal_value.

     A literal_value may not exceed 256 characters; longer
     literals are truncated to 256.
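
The rules above can be sketched in Python as follows. This is only
an illustration of the specification, not the scanner's code; the
representation of the source as a list of record strings and the
names `scan_literal` and `MAX_LITERAL` are assumptions.

```python
MAX_LITERAL = 256  # longer literals are truncated to this length


def scan_literal(records, start):
    """Return the literal_value that begins at records[0][start].

    records -- source records (cards) as strings, the first of which
               contains the C. preface
    start   -- index in records[0] of the first character after C.
    """
    chars = []
    pos = start
    for record in records:
        while pos < len(record):
            ch = record[pos]
            if ch == " ":          # first blank ends the literal
                return _finish(chars)
            chars.append(ch)       # '!' and COMMENT are ordinary text
            pos += 1
        pos = 0                    # end_of_source_record is transparent
    return _finish(chars)


def _finish(chars):
    """Truncate to the maximum length and map underscores to blanks."""
    return "".join(chars)[:MAX_LITERAL].replace("_", " ")
```

For example, `scan_literal(["C.ABC", "DEF X"], 2)` yields the single
literal ABCDEF, with the record boundary leaving no trace.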

Figure 9 shows the literal_value test cases. Contained in
at least one of the literals are ! and COMMENT, each character of
the character set, an undefined character, an underline, and an
end_of_source_record (but no blank_character, of course). The
data also include a literal longer than 256 characters and one of
length zero.


|C.  ABCDEFGBIJKLBNOPQRSTUVIXYZ_()  .  :  ;,  •  i#  $*&*-  +  /="-.><  |  ?  0 123456789 _!THISI SNDTACOHH | 
I  ENTBOTAtOHGCHARACTERSTPI1IGCONTAININGivERYCHARACTERINTHECRARSET_ITISVERYL01IGBOTST| 
I ILK256!  !abcdefghjijkl»nopqrstUTwxyz  C._  C.  I 

|  C.  A BCDEFGHIJKLB!inPQRSTnVWXYZ_01  2  3U56  789_S#J«r,* -♦="/><-. |?_()  .  ,  ;:•  _!THTSL ITERAL ALS | 
|OCONTAINSE?ERYCHARACTERTHTHECHAPACTERSET_C0HHE!IT  BUTITISH0CHLOIGERTHA1I256ANDW  ILL| 
| BETRONCATEDSOTHATITISLESSTHANT«OHONDREDPIPTYSIXCHARS;HOHEVERITGBTSINCREASI1IGLY#?| 
|  nFBICOlTTOBRITECBARACTERSTRIRGSTHATARESOVERYLO!IG[  abcdefghi  jklin  opqrstUTtrxyz?  ]t  \ 
I  C.THISISJOSTASBORTLITERALTHATHAPPERSTOEIIDIHCOLUHNaOl 

I  I 


Figure 9
Character Literal Test Cases


The first execution of this test was literally a disaster.
Literals that were contained completely on one line and
surrounded by blanks were handled correctly, but strings
containing end_of_source_records were badly mangled. The first
occurring end_of_source_record ended the literal, and, if 'x' was
the string between the C. and the first end_of_source_record, the
token "recognized" became 'xx' (i.e., the string concatenated
with itself). The remainder of the literal was interpreted as a
sequence of new tokens.

The main cause of these problems was the procedure that read
in a new line of source text and the conditions this procedure
expected to be true when invoked; the procedure was designed to
do more processing than it logically should have, and by doing so
thwarted the handling of character literals. These problems were
repaired, and the tests were rerun. This time all literal_values
were correctly isolated.

3.7   Complete Lexical Analysis

After having tested each portion of the lexical analysis
individually, it was necessary to integrate all these tests
in order to test the entire lexical analysis phase. The
motivation behind the design of the composite test was the
following: assume all tokens in the input stream have been
correctly processed (i.e., as predicted) up to a given point;
the correct isolation of the next token now depends only on what
follows it. (The fact that the lexical analyzer is a procedure
that is called each time a token is needed for the parse was part
of the basis for making this assumption.)

The first step in the design of this test was to generate a
list of specific token types that had been previously tested
individually. Then this list was expanded by differentiating,
within each token type, tokens that might have distinct ending
characteristics (e.g., identifiers ending in a letter vs.
identifiers ending in a digit). The original list was again
expanded by differentiating on beginning characteristics (e.g.,
decimal constants beginning with a decimal digit vs. those
beginning with a minus sign). Figure 10 shows both complete
lists. Character literals were not included in the distinct
endings group since these must all be followed by a
blank_character which is not considered a part of the literal.


DISTINCT ENDINGS

 1  delimiter character
 2  comment:  COMMENT ;
 3  comment:  ! end_of_source_record
 4  identifier ending in a letter
 5  identifier ending in a digit
 6  operator
 7  decimal constant
 8  hex constant ending in a letter
 9  hex constant ending in a digit
10  long_integer constant
11  binary constant
12  bit constant
13  end of source record


DISTINCT BEGINNINGS

 1  delimiter character
 2  comment:  COMMENT ;
 3  comment:  ! end_of_source_record
 4  identifier
 5  operator
 6  positive decimal constant
 7  negative decimal constant
 8  hex constant
 9  long_integer constant
10  binary constant
11  bit constant
12  character literal
13  end of source record


Figure 10
Token Classes Used In Composite Test


A matrix was constructed, similar to that used in the
delimiter test in section 3.2, where the rows were numbered with
the classes of tokens possessing distinct ending characteristics,
and the columns numbered with the classes of tokens having
distinct beginning characteristics. (See Figure 11.) The idea
here is to ensure that each token with a distinct ending is
followed in the test case by each token with a distinct
beginning. Another important criterion of this composite test is
that it include every instance from the sub-function tests, in
order to preserve the "testedness" of the sub-functions.

The actual test cases were then generated, filling in the
matrix as follows: the entry in row i and column j of the matrix
received the value 'n/m' when in the test data a token of type j
followed a token of type i on card n, and the first character of
the type j token appeared in card-column m. For example, the
entry in row 4 and column 7 is 3/11; this indicates that an
identifier ending in a letter (IDentifier) immediately precedes a
negative decimal constant (-65000), where the minus sign is the
11th character on line 3 of the test data. The actual test data
appear in Figure 12. It is easy to verify that all sub-function
tests are included in this data.
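
The bookkeeping behind such a matrix can be sketched in Python. The
sketch assumes the thirteen ending and thirteen beginning classes of
Figure 10; the function names and the tuple layout are hypothetical.
Each observed adjacency is recorded as 'card/column', and the cells
still empty are the pairs for which test data have yet to be written.

```python
def build_matrix(pairs, n_rows=13, n_cols=13):
    """Record each observed (ending, beginning) adjacency.

    pairs -- iterable of (ending_class, beginning_class, card, column),
             with classes numbered from 1 as in Figure 10
    """
    matrix = [[None] * n_cols for _ in range(n_rows)]
    for ending, beginning, card, column in pairs:
        matrix[ending - 1][beginning - 1] = f"{card}/{column}"
    return matrix


def uncovered(matrix):
    """List the (row, column) pairs that no test case exercises yet."""
    return [
        (i + 1, j + 1)
        for i, row in enumerate(matrix)
        for j, cell in enumerate(row)
        if cell is None
    ]
```

An identifier ending in a letter (ending class 4) followed by a
negative decimal constant (beginning class 7) on card 3, column 11,
would be entered as `build_matrix([(4, 7, 3, 11)])`.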

The results of this test reaffirmed that previous repairs had
been successful, but brought to light one more problem. If any
numeric constant was followed immediately by a negative decimal
constant (or an operator beginning with a "-"), the two were
recognized together as a "constant" (e.g., B.0110-35); when
evaluation was then attempted, an error occurred because the
interior minus was not a digit. Rerunning the test showed that
the first repair made for this problem was incorrectly performed,
which wreaked havoc everywhere. The repairs themselves were then
fixed, and a re-execution of the data showed that indeed the
problem was finally solved.
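
The intended behaviour, in which a minus sign may only begin a
constant body and never continue one, can be sketched in Python with
regular expressions. The patterns are assumptions based on the
prefaces used in this chapter (B. for binary, X. for hexadecimal,
F. for long_integer); they are not the scanner's actual rules, and
the sketch treats a following "-" simply as the start of the next
constant.

```python
import re

# Hypothetical constant patterns: an optional minus is allowed only
# directly after the preface (or at the start of a decimal constant).
CONSTANT = re.compile(
    r"B\.-?[01]+"        # binary constant
    r"|X\.-?[0-9A-F]+"   # hexadecimal constant
    r"|F\.-?[0-9]+"      # long_integer constant
    r"|-?[0-9]+"         # decimal constant
)


def split_constants(text):
    """Split a run of adjacent numeric constants into separate tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = CONSTANT.match(text, pos)
        if match is None:
            raise ValueError(f"illegal character at position {pos}")
        tokens.append(match.group())
        pos = match.end()
    return tokens
```

Here `split_constants("B.0110-35")` returns the two tokens B.0110
and -35 rather than one malformed "constant".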

4   Syntactic Analysis

The testing of the syntactic analysis phase should encompass
the design of test data which would exercise every legal
production rule of the CLEOPATRA grammar. Unfortunately, as
discussed in Chapter 2, it was virtually impossible to exercise
each production rule in every valid context. Therefore, the
motivation behind the design of the test data was simply to
exercise each legal production at least once; in cases where
there were several productions for a single non-terminal (e.g.,
the many syntactically different ways of writing an ITERATE
statement), however, the productions appeared in as many
different contexts in the test data as was feasible. In
addition, since error-correction facilities are very limited in
this implementation, illegal constructs were generally not
tested.

Test data for a parser usually consist of syntactically
correct but semantic-free program segments. Because syntax and
semantics were implemented concurrently in this compiler, the use
of semantic-free test programs would have been impractical due to
the abundance of semantic error messages that would have been
generated. Furthermore, the use of semantically meaningful data
in this phase of testing facilitated the tests for semantic
correctness, as reported in Chapter 5.

Two main programming examples were chosen as the basis of
the remaining testing effort. One is a very skeletal scanner
based somewhat on the CLEOPATRA language itself; it is not
intended to be complete, or even correct, but rather is intended
to illustrate how one might approach such a task using CLEOPATRA.
The second example shows how several user-defined data types
might be manipulated. The presentation of these examples is
divided into three parts: the first consists of the structure and
data blocks, the second incorporates routine blocks (including
all statements, but with only the most rudimentary expressions),
and the third covers expressions in detail.

4.1   Structure and Data Blocks

This  section  reports  the  results  obtained  from  testing  four 
of  the  types  of  program  blocks  in  CLEOPATRA:  global  structure 
blocks,  local  structure  blocks,  global  data  blocks,  and  local 
data  blocks.   The  specification  production  rules  for  these  blocks 

are: 

(4.1)      basic_ref_type ::= INTEGER | LONGINTEGER | BIT |
               CHARACTER
(4.2)      basic_type ::= INTEGER | LONGINTEGER | BIT |
               CHARACTER [(expression)]
(4.4)      ref_type ::= { basic_ref_type | type_name }
               [integer EXTENTS]
(4.5)      type ::= { basic_type | type_name } [array]
(5.1)      structure_block ::= STRUCTURE configuration_name
               { ; link_item }* [;] END configuration_name [;]
(5.2)      configuration_name ::= procedure_name | type_name |
               operator_link
(5.3)      link_item ::= TYPE type_name [ALIAS identifier] |
               global_link_item
(5.4)      global_link_item ::= PROCEDURE procedure_name
               [ALIAS identifier] [ ref_type_list . ]
               RETURNS basic_ref_type
(5.5)      global_link_item ::= operator_link : OPERATOR
               [left_ref_types] operator [ALIAS identifier]
               right_ref_types RETURNS basic_type
(5.10)     ref_type_list ::= (type_formal [, type_formal]*)
(5.11)     type_formal ::= ref_type [BY ADDRESS]
(5.12)     left_ref_types ::= ref_type [ BY ADDRESS :
               [ref_type_list] | . ref_type_list ]
(5.13)     right_ref_types ::= ref_type | : ref_type BY ADDRESS
               | ref_type_list { . ref_type | : ref_type
               BY ADDRESS }
(5.14)     global_structure_block ::= GLOBAL STRUCTURE type_name
               { ; global_link_item }* [;] END type_name [;]
(5.15)     type_name ::= identifier
(5.17)     procedure_name ::= identifier
(5.24)     operator_link ::= identifier
(5.29)     data_block ::= DATA configuration_name
               { ; [CONSTANT | DEFER] data_group }*
               [;] END configuration_name [;]
(5.30)     global_data_item ::= GLOBAL DATA configuration_name
               { ; [CONSTANT | DEFER] data_group }*
               [;] END configuration_name [;]
(5.31)     type_data_block ::= GLOBAL DATA type_name
               { ; basic_type [array] item [, item]* }*
               [;] END type_name [;]
(5.32)     data_group ::= { basic_type | type_name } [array]
               item [, item]*
(5.34)     array ::= { RIGHT | VECTOR } (bound [, bound]*)
(5.35)     bound ::= { *:* | [integer:] integer }
(5.36)     item ::= identifier [INIT constant]


These productions are taken from Halbur [9], with the following
modifications from Fisher [5]:

1)  pointers are not implemented, so they have been deleted
    from productions 4.1 and 4.2;

2)  types and ref_types (productions 4.4 and 4.5) may not
    have ALIAS names;

3)  procedures may return basic_ref_types, not basic_types,
    as specified by Halbur.

In generating the test data, every form of each production
was used at least once; for example, production 5.13 has four
forms:

     right_ref_types ::= ref_type
                       | : ref_type BY ADDRESS
                       | ref_type_list . ref_type
                       | ref_type_list : ref_type BY ADDRESS

and each one has been exercised. The test data are
self-documenting in that internal comments indicate what production
rules have been used to generate the code. Figure 13 shows the
initial test data.
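
The enumeration of forms can be mechanized. The following Python
sketch expands the alternatives of production 5.13 into its four
forms; the nested-list encoding of the production and the names
`expand`, `forms` and `RIGHT_REF_TYPES` are assumptions introduced
here, not part of the test harness described in this report.

```python
from itertools import product


def expand(sequence):
    """Expand one alternative, given as a sequence of choice lists."""
    return [" ".join(choice) for choice in product(*sequence)]


def forms(alternatives):
    """Return every concrete form of a production."""
    result = []
    for alternative in alternatives:
        result.extend(expand(alternative))
    return result


# Production 5.13: each top-level entry is one alternative; each
# inner list holds the choices available at that position.
RIGHT_REF_TYPES = [
    [["ref_type"]],
    [[": ref_type BY ADDRESS"]],
    [["ref_type_list"], [". ref_type", ": ref_type BY ADDRESS"]],
]
```

Calling `forms(RIGHT_REF_TYPES)` lists the same four forms shown
above, each of which needs at least one test case.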
STRUCTURE  SYNTAXTEST

PROCEDURE  PURSER  RETURNS  LONGINTEGER 

PROCEDURE  HANIP  ALIAS  TTPBCHECK  RETURNS  LORGIRTEGER 

PROCEDURE  OOTPUT(CHARACTER) .  RETURNS  LORGIRTEGER 

END  SYNTAXTEST 


STRUCTURE  PARSER 

;  TTPE  SYHTABBNTRY 

;  PROCEDURE  SCANNER  RETURNS  INTEGER 
;  END  PARSER 

GLOBAL  DATA  S YRTABENTPY 

;  INTEGER  NAHETABPTR  INIT  0, 
TOKLENGTH  INTT  0, 
TOKTYPE   INIT  0, 
VALUE     INIT  0 
END  SYNTABENTRY 

GLOBAL  STRUCTURE  SYHTABENTRY 

;  SETENTRY:  OPERATOR  #=>  (INTEGER , INTEGER , INTEGER) . SYHTA BENTRY 
RETURNS  BIT 
END  SYHTABENTRY; 


GLORAL  DATA  PARSER 

CONSTANT  INTEGER  SYHTABHAI  INIT  100 
SYHTABENTRY  RIGHT(IOO)  SYRTAB 
CHARACTBR(I)  VECTOR  (1:1000)  NAHBTAB 
RI^  EOF  INIT  S.O 

;  END  PARSER 


•JT.OBAL  DATA  SCANNER 

CONSTANT  INTEGER  LINELIHIT  INIT  80 
INTEGER  CAPDNO  INIT  1,  CAPDPTR  INIT  1 
CHARACTER (80) CARD,  TOKEN  INTT  C. 
CHAR  ACTER(81)  TCARD 

END  SCANNER; 


STRUCTURE  SCANNER 

-.OROCED'JRE  LTNETN  ALIAS  GETLINE  (CHA  R  ACTER  BY  A DDRESS)  . RBTURNS  BIT 
;CONSTA"JTEVAI,:  OPERATOR  |  i  |  (INTEGER)  .  CHAR  ACTER  RETURNS  LONGINTEGER 
;PROCEDMRE  SCA NFRROR (INTEGER) .  RETURNS  BIT 

;LOOKnP:  OPERATOR  LOOKFOR (SYHTABENTRY  1  EXTENTS) .CHARACTER 
RETURNS  INTEGER 

;  FND  SCANNER: 

DATA  SCANNER 

;INTEGER  T;  CHAPACTEP(I)  TCHAR 
END  SCANNER; 

STRUCTURE  LIN^IN 

;PPOCEDURE  INPUT (CHA RACTER  BY  ADDRESS) . RETO PNS  LONGINTE1ER 
TRANSLATE:  OPERATOR  9$%    ALIAS  TRANS  (CHAR  ACTER,  CHAR  ACTER)  . 
CHARACTER  RETURNS  CHARACTER 

RND  LINRIN; 

OATA  LINRTN 

CHARACTER  (80)  TNPOTSTRING  INIT  C. 

;CONSTANT  CHARACTER(80)   INCHARS  INIT  C. ABCDEFGHI JELENOPQ RSTUViX YZ ( ) . : , : 
»#t%r,*-*  =  »?/<>|[  -,012  31^678  9! 

.TCHARS  INIT  C.  1 1 1  1  1 1  1 1 1 1  1 1 1 1 1 1  1 1 1 1 1 1 1 1 1 1 222 222 2 
233133333  1333  33  11 3am»U4U  HUH  U5 
RND  LINEIN 

DATA  CONSTANTEVAL 

;TNTEGER  RADIX  INIT  10,  SIGN  INIT  1,  LOB  INIT  1,L, DIGIT 

;LONRINTEGER  VALUE  INIT  F.O 

;CHAPACTER (80)  STRING 
; END  rTNSTANTEVAL 


Figure 13
Structure and Data Block Test Data
STRUCTURE  MANIP;

TYPE  STACK  ALIOS  YOTOLIST; 
TYPE  F1ATRIX1BY3; 
TYPE  COMPLEX; 

TNNERPRODnCT:  OPERATOR  INTEGER  1  EXTENTS  .  (INTEGER ,  IHTES  ER) 
♦  *  ALIAS  DOTPRODnCT  ( INTEGER) . INTEGER  1  EXTENTS 
RETURNS  LONGINTEGER; 
PROCEDURE  PROCWTTR2PARHS (INTEGER, INTEGER) .  RETURNS  BIT; 
END  NANIP; 

DA""A  HANIP; 

STACK  STACK1,STACK2; 

COHPTEX  RTGHT(10,5)  CR1,CR2; 

MATRIX3BY3    M,H2 

;  INTEGER  INTEXPR 
END  MANTP; 

GLOBAL  DA^A  HANIP; 

INTEGER  VECTOR(B)  VEC1 , VEC2 , VKC3 
END  HANIP; 

GLOBAL  DATA  STACK; 

INTEGER  VECTOR (1: 100)  PDLTST; 

INTEGER  TOP  INTT  0; 
END  STACK; 

GLOBAL  STRUCTURE  STACK; 

POSH:  OPERATOR  INTEGER  POSH:  STACK  BY  ADDRESS  RETURNS  BIT; 

POP:  OPERATOR  INTEGER  BY  ADDRESS:  POP  STACK  RETURNS  BIT; 

INSPPCT:  OPERATOR  STACK.  (INTEGER)  ???  IHTEGBR  RETURNS  INTEGER; 
END  STACK; 

DA^A  PnSH; 

STACK  SI;  INTEGER  I; 
END  PUSH; 

DATA    pnp; 

STUCK    S1;     INTEGER    T; 
END    POP; 

T>\t\    INSPECT; 

INTEGER  Z,T,J;  STACK  S1 

END  INSPECT 

DATA  TNNEPPPODUCT; 

lONGIN^EGFR  PRODUC"  INIT  F.O; 
INTEGER  VECTOR  (*:*)  V1,V2; 
INTEGER  I,J,nl,L1,02 

END  TNNFPPPODUCT; 

GLOBAL  DATA  n A^P IX3BY7 ; 

INTEGER  VECTOR(3,3)  W1, EXTRA; 
END  MATRIX3BYT: 

GLOBAL  STRUCTURE  w A",RTX3BY3  ; 

flATRITOP:  OPERATOR  MATRIX3BY3  BY  ADDRESS :  (INTEGER)  DHALNULT 
(INTEGER)  :HATRTX3BY3  BY  ADDRESS  RETURNS  INTEGER 
END  NATRTX3RY3; 

GLOBAL  DATA  COHPLFX; 

INTEGER  TNTPART,I"AGPART 
END  COMPLEX; 

GLOBAL  STRUCTURE  COMPLEX; 

ABS:  OPERATOR  ABS  COMPLEX  RETURNS  INTEGER; 
END  COMPLEX; 


Figure 13 Continued
Structure and Data Block Test Data

The execution of these test data exposed several
discrepancies between the language described by the BNF and the
language accepted by the parser. These discrepancies and their
descriptions are given below.

1)  In production 5.1 the semicolon after the "END
    configuration_name" is not optional, as implied by the
    BNF. When omitted, an error message is issued and
    normal processing continues.

2)  The operator of production 5.5 cannot be the same as
    the operator_link. If they are identical, an error
    message claiming "duplicate declaration" is issued,
    and the name is no longer recognized as being an
    operator_link. This causes additional problems later
    in the program when that operator's definition blocks
    are encountered.

3)  There are several problems with production 5.12.
    First, the ref_type_list is not optional after "BY
    ADDRESS:". If it does not appear, every token up to
    and including the next semicolon is ignored, the token
    following the semicolon is assumed to be the operator
    of production 5.5, and processing continues from there
    as if no error had occurred. Inevitably, more problems
    arise as a result of the "correction." Second, it
    appears that the ref_type in production 5.12 cannot
    include an EXTENTS phrase. When one is included, it
    causes the next n tokens declared (where n is an
    unpredictable number, usually on the order of five) all
    to be mapped into the same symbol table location as the
    integer extent number. The most serious side-effect of
    this is that if the programmer is unfortunate enough to
    have included an array declaration later in the
    program, the confusion in the symbol table causes an
    infinite loop in the array cleanup routine.

4)  An interesting deviation involving production 5.36 is
    that if it appears in any context other than CONSTANT,
    the identifier is not restricted to being initialized
    to a constant value; it may be initialized to any
    previously declared token-name (e.g., a procedure name
    or a punctuation mark). Only in the CONSTANT context
    is any type-checking done.

With the exception of the EXTENTS phrase problem in
production 5.12 (the cause of which is still unclear), proposals
for the repair of these inconsistencies have been documented, but
not yet implemented. However, the restrictions imposed by these
errors are not overly confining in view of the more serious
problems encountered in routine blocks, as described in the next
section.

4.2   Routine Blocks

Routine blocks in CLEOPATRA define the actual operations to
be carried out by a procedure or operator. The production rules
for these blocks (including expressions) are:

(5.16)     routine_block ::= PROCEDURE procedure_name
               [name_list .] { ; statement }* [;] END
               procedure_name [;]
(5.18)     name_list ::= (formal [, formal]*)
(5.19)     formal ::= identifier
(5.20)     procedure_call ::= procedure_name [parameters]
(5.21)     parameters ::= (actual [, actual]*)
(5.22)     actual ::= expression
(5.23)     routine_block ::= operator_link : OPERATOR [left_names]
               operator right_names { ; statement }* [;] END
               operator_link [;]
(5.25)     left_names ::= ref_type identifier
               [ { : | . } name_list | : ]
(5.26)     right_names ::= [:] ref_type identifier | name_list
               { : | . } ref_type identifier
(5.27)     operator_call ::= [expression [ { : | . } [params1] ] ]
               operator [ [params2] { : | . } ] expression
(5.27a)    params1 ::= (actual [, actual]*)
(5.27b)    params2 ::= 'actual [, actual]*'
(6.1)      expression ::= constant | system_supplied_value |
               system_supplied_constant | array_reference |
               procedure_call | operator_call | (expression)
(6.2)      array_reference ::= array_expression
(6.3)      array_expression ::= identifier
(7.1)      stmt ::= expression | NIL
(7.2)      stmt ::= RETURN expression
(7.3)      stmt ::= EXIT [label]
(7.4)      stmt ::= IF expression THEN statement
               [ [;] ELSE statement ]
(7.7)      lstmt ::= cstmt | label : cstmt label
(7.8)      label ::= identifier
(7.9)      cstmt ::= BEGIN statement [; statement]* [;] END
(7.10)     cstmt ::= [for_phrase] ITERATE [while_phrase]
               statement [; statement]* [;] [when_phrase] END
(7.11)     for_phrase ::= FOR identifier [FROM expression]
               STEP expression [ [;] { UPTO | DOWNTO } expression ] ;
(7.12)     while_phrase ::= WHILE expression ;
(7.13)     when_phrase ::= WHEN expression [;]
(7.14)     cstmt ::= DECISION decision [; decision]*
               [;] ACTION action [; action]*
               [;] [ ELSE statement [;] ] END
(7.15)     decision ::= switch : expression
(7.16)     decision ::= switch [, switch]* [list_layout] :
               list_index
(7.17)     switch ::= identifier
(7.18)     list_layout ::= VECTOR (integer : integer)
(7.19)     list_index ::= expression
(7.20)     action ::= switch_expression : statement
(7.21)     switch_expression ::= [¬] switch [ switch_operator [¬]
               switch ]*
(7.22)     switch_operator ::= & | '|' | == | ¬= | AND | OR
(7.23)     statement ::= stmt | lstmt




These productions are again based on Halbur [9], but reflect
the following modifications introduced by Fisher [5]:

1)  parentheses and quotes are used in production 5.27 to
    disambiguate parameter lists;

2)  ALLOCATE and RELEASE statements have been deleted; and

3)  a revision of the FOR phrase of the ITERATE statement
    was made.


The initial test program for these productions is similar to
that shown in Figure 14. The routine blocks were incorporated
into a revised (i.e., "error"-free) version of this data, and
expressions were replaced with single identifiers.

Initial attempts at execution of these test programs were
disastrous. A significant number of serious compiler errors were
found. The following is a description of these errors:

1)  The dot after the name_list in production 5.16 is not
    accepted at all. If it appears, it is ignored and an
    error message is issued.

2)  Productions 5.25 and 5.26 are implemented as if all
    occurrences of ref_type were replaced with
    basic_ref_type. This means that the operands of a
    user-defined operator cannot be arrays or user-defined
    types, even though these may appear in the original
    declaration of the operator. If a user-type name or an
    extent integer does appear as a left_name, the
    type_name or integer, respectively, is assumed to be
    the operator, processing continues expecting
    right_names, and more errors result. If either of
    these entities appears in the right_names, an error
    message is issued, and the remainder of the right_names
    is ignored with no other apparent side-effect.

3)  If at any time during the processing of the statement
    lists of productions 5.16, 5.23, 7.9 or 7.10 a
    statement is encountered which begins with any token
    whose symbol-table number is greater than or equal to
    18 and less than 66 (this includes most reserved words
    and all delimiters, see Halbur [4]), the parser enters
    an infinite loop. This trap is particularly easy to
    fall into since the letters B, C, F, S and X are
    considered reserved words, but no warning is given if
    they are declared as identifiers.

4)  When a compound statement is labelled (via production
    7.7) error messages are issued claiming that the label
    is an undeclared identifier. The identifier is later
    recognized as a label when the following colon is
    found, but because of the error messages the program
    can never be passed on to the code generator and
    therefore can never be executed.

5)  The FOR phrase of production 7.11 is not implemented as
    Fisher claimed. The form that is implemented is a
    compromise between the Fisher and Halbur versions:
        for_phrase ::= FOR identifier [ FROM expression ]
            { { DOWNTO | UPTO } expression [;] |
              STEP expression [;] [ { DOWNTO | UPTO }
              expression [;] ] }

6)  A decision statement (production 7.14) cannot be
    labelled since it is not considered a compound
    statement in the implementation.

7)  Switch expressions (production 7.21) do not work for
    several reasons. First, switch identifiers (production
    7.17) are given the type "unknown identifier" in the
    symbol table instead of being of the type bit as one
    might expect. Second, the switch operators (production
    7.22) & and | are not predefined system operators, so
    they may not be used at all, and the other
    switch_operators understandably do not accept unknown
    identifiers as operands. Even if the switches were of
    type bit, the wrong type result would be returned.

With the exception of errors 3 and 5, repairs for these
discrepancies have been documented but not yet incorporated into
the current version of the implementation. Since the FOR phrase
as described in 5 is closer to the design goals discussed in
Fisher [5] than the specified form, it was decided to leave it as
is. Error 3 involves fairly far-reaching side effects, not all of
which have yet been tracked down.

4.3   Expressions

It would appear to the casual observer that the sections of
this chapter so far have described errors of increasing number
and gravity. In this section, unfortunately, the trend is
accelerated. To summarize briefly, expressions in this
implementation simply do not work.

The specification productions for expressions were listed in
section 4.2 and describe expressions whose operators are
associated from right to left, along with means for calling on
user-defined operators and procedures. For the initial test data
attempted, expressions were reincorporated into the routine
blocks of Figure 14. The errors uncovered by this experiment
are given below.

1)  The system-supplied substring operator <- can never be
    found in the symbol table because of incorrectly
    initialized buckets and bucket-links used by the lookup
    routine. More specifically, these links contain two
    different pointers to the token " (concatenate) but
    none to <-.

2)  Many of the system-supplied operators return the wrong
    type result. In particular, the comparison operators
    do not return bit values; instead, they return a
    result of the type of the two operands being compared.
    For example, I>J returns an integer result if I and J
    are of type integer, a character result if I and J are
    of type character, etc. LBOUND returns a result of
    type error, and HBOUND, LENGTH, "-> (index) and ?->
    (verify) all return integer results. LINT and CHAR
    return character and longinteger results, respectively,
    instead of the reverse. Also, several operators do not
    accept the specified type of operands, notably LENGTH,
    HBOUND, -> and <-. Figure 15 shows a sample program
    illustrating these points. (No error messages were
    issued except in the two places noted in-line!)
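
The right-to-left association mentioned at the start of this section
can be illustrated with a small Python sketch. It assumes that all
binary operators share a single precedence level, which this report
does not state, and it uses ordinary arithmetic operators as
stand-ins for CLEOPATRA operators; the names are hypothetical.

```python
import operator

# Stand-in operators; not CLEOPATRA's system-supplied set.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def eval_right_to_left(tokens):
    """Evaluate alternating operands and operators, grouping from the right.

    tokens -- e.g. [10, "-", 4, "-", 3], read as 10 - (4 - 3)
    """
    if len(tokens) == 1:
        return tokens[0]
    left, op, rest = tokens[0], tokens[1], tokens[2:]
    # The whole remainder is evaluated first and becomes the right operand.
    return OPS[op](left, eval_right_to_left(rest))
```

Grouped this way, 10 - 4 - 3 is read as 10 - (4 - 3) and 2 * 3 + 4
as 2 * (3 + 4), so test expressions written with left-to-right
habits will predict the wrong results.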
STRUCTURE  SYNTAXTEST

procedure  purser  returns  longinteger 

procedure  nantp  alias  typecreck  returis  loigtiteger 

procedure  ontpot(craractrf) .  returns  longintegef 

end  syntaxtest 


STRUCTURE  PARSER 

;  TYPE  SYRTabeNTRY 

:  PROCEDURE  SCANNER  RETURNS  INTEGER 
;  END  PARSER 

GLOBAL  DATA  SYNTABBNTRY 

;  INTEGER  NANETABPTR  TWIT  0, 
TONLENGTH   TNIT  0, 
TOKTTPE     INIT  0, 
VALUE       IlfIT  0 

END  SYNTABBNTRY 


3LOBAL  STRUCTURE  STWABPRTRT 

;  SETENTRY:  OPERATOR  #  =  >  (INTEGER,  INTEGER,  INTEGER)  .  SYNTABBNTRY 
RETURNS  BIT 
END  SYNTABBNTRY; 

GLOBAL  DATA  PARSER 

CONSTANT  INTEGER  STHTAFHIX  INIT  100 
SYNTABBNTRY  RIGHT(100)  SYNTAB 
CHARACTBR(I)  VICTOR  (1:1000)  NAHETAB 
BT*  Elf  INIT  S.O 
END  PARSER 


PROCEDURE  PARSER 

;  ITERATE  SCANNEP  WREN  BOP  = 
ENO  PAPSER 


=  S. 1  END 


GLOBAL  DATA  SCANNER 

CONSTANT  INTEGER  TTNBLIHIT  INIT  90 
INTEGER  CARDNO  INIT  1,  CAPDPTP  INIT  1 
CHARACTER  (10)  CARD,  *OEEN  INIT  C. 
rHAR A?trp («1j  tcaRD 

END  SCANNER; 


STRUCTURE  SCANNER 

:PROCED(TPE  LTN»TN  ALIAS  GETI  TNE  (CHA  R  ACT  BR  BY  A  DDRESS)  .  R"!  TtJRNS  BIT 
;CONSTANTEVAL:  OPERATOR  |  # |  (INTEGER)  . CHAR  A ~TER  RETURNS  LONGT*  T  ESER 
:PP0CED0RE  STANPRROR (INTEGEP) .  RETURNS  BIT 

;L">0E1P:  OPrpjTnR  LOOKBOR (SYNTABENTRY  1  EXTENTS) . CHA P ACTER 
RETURNS  TNTEGFP 

;  END  SCANNER: 

DATA  SCANNER 

;INTEGER  I;  CHARACTER (1)  TCHAP 
END  SCANNER; 

PROCEDURE    SCANNER 

;  RETTONEN:  ITERATE  WHILE  TRnE: 

OBCISION  LETTER, DELTNITER, SPECIAL, DIGIT, "ONTPOL,EOSR  : 

3HAR  1  ->  TCARD  <-  CAPDPTR; 
ACTION; 

LETTER:  BEGIN  POR  I  EPON  CARDPTRO  OPTO  LINELIHIT; 
ITERATE  TOKEN:=TOEBN"1->CARD<-I  -  1; 

TCHAP  :=  1->TCARD<-I; 
WHEN  (TCHAP-*=C1  )  AND  TCRAP-.=C.  *  END; 
PETORN  LOOKPOR'SYNTAB) . TOKEN 
END; 
LETTER  AND  DIGTT:  Nil;  !IEPOSSTBLE  CASE 
DELTNTTBP:  mr  EN :  *1  ->CARD<- 1  ; 

CONTROL  OP  EOSR:  IE  LINBIN(CARD)  THEN  SCANERROR(I)  ELSE  EITT; 
DIGIT:  'CHAP  :=  ".«  ; 
SPPCTAL:  "*HAP  :=  P.  T  ; 


Figure 14
Routine  Block  Test  Data 
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DIGIT  OR  SPECIAL  :  BEGIR  TORE!  :» 

(CARDPTR:*TCRtR  ?->  TCARD  <- 
CAPDPTP»1)  ->  CARD  )  <-  3ARDPTP; 
RETORR  LOORFOR'STRTAB) .TDKRR; 

ERD 
PI.SE  SCARRRR"*RC?)  •RRDEPTRED  CHARACTER 
BRD  'DECISIS 
ERD  GBTTORBB 
BRD  SCAKRPR 

STROCTORE  LIRPTH 

;PROCEDfTPE  TRPnT(CHAPACTER  BT  ADDBBSSJ  .  RETDBWS  LORGIRTE3EF 
;TRARSLATE:  OPERATOR  »S%    ALIAS  TBIRS (CHARACTER, CHARACTBR) . 
CHARACTER  RPTORRS  CHARACTER 

ERD  LTRBIR; 

DATA  LIHBTB 

;CHARACi*ER  (90)  IRPOTSTRTNG  TRTT  C. 

;CORSTMT  CHARACTER  (80)  IRCRARS  IRTT  C.  ABCDEPGRIJKLHROPJRSTOVilTZ  ().:,;  • 
»*$«£•-♦ ="?/<>|  [-.0 12  3*56789! 

,TCHAPS  IRIT  C. 111111111111111111111111112222222 
23  3  33333  33  33  33  33 330<UH»«««HHHl 5 
BRD  LIREIR 

PROCEDURE  LIRErR(TH?D*STRTRG) . ; 

TP  IRPUT (TRPOTSTRIRS) ==P.O 
▼HER  RBTURR  PALSE; 
ELSE  BEGTH  CARDHO  :=  CAPDR3H; 
CARDPTR  :=  1; 

OUTPUT ((CHAR  CARDRO) "C. |   "IHPOTSTRIRG"C.  |  ; 
TCARO  :=  TRARS'IRCH»RS,TCRARS)  .CARD"C6  ;~ 
RP^OPR  troe 
BRD; 
BRD  LTRPTR; 

DATA  CDRSTANrpf AL 

;TRTE3FP  RADII  IRTT  10,  SIGH  IRIT  1,  LOR  IRIT  1,L, DIGIT 

;LONGIN*EGER  VALUE  IRTT  P.O 

;CRARACTPR  (80)  STRTRG 
;FRD  CORSTAR^ETAL 

rONSTART"?AL:OPPRAT">R  \  t|  (RADTT)  .CHARACTEP  STRTRG 
:  TP  "..-    --     (I:=1|  ->STRIRG 
rRER  BEGTH  SIGB  :=  -1;  T:=2  PRO; 
PDF  I  nPTO  LPRGTH  STRTRG  TTEPATB 

DIGIT  :=  (  1  ->  STRTRG  <-  I)  "->  C. 0 1 21056789ABCDBF  ; 

TP  nrGTT  -.=  IRTBGER  RIL 

THE*  7AIUP  :=  (DIGIT  -  1)  «-PADTf •▼ALOE 

BI.SP  RETORR  P.O 
ERD; 

PRTORR  SIGR*VALUE 
ERD  CORSTARTETAL 

STRUCTURE  HARIP; 

TT»E  STACR  ALTAS  TOTOLTST; 
"TPE  NArRHr3PT3; 

TYPP  CORPLE*; 

TRRERPRDDOCT:  OPERATOR  IRTEGEP  1  EfTERTS  . (IRTEGER, IBTESEt) 
«•*  ALTAS  DOTPPODttCT  (TRTPGER)  .  IRTPGER  1  EITERTS 
RPTURRS  T.ORGIRTBGEP; 
PROCEDURE  ORDCRITR7PARI1S  (IRTBGER , THTEGER)  .  RPTORRS  BIT; 
ERD  HARIP; 

DA^A  RAHIP; 

STACK  STACR1,STACK2; 

COHPLPT  RTGR,,(10,S)  CR1,CP2; 

"ATRTT3«T3  R1,R2 

;  IRTErtPR  TR^RXPP 
ERD  HARIP: 


Figure 14 Continued
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GLOBAL  DA'A  NANTP; 

INTPGPR  VPCTORfR)  VEC1,¥EC2,VEC3 
END  HHWTP: 

PROCEDORE  HANTP; 

NONSPNSP:  begin 

PROCWTTH2PARWS (S.O,B. 01001)  ; 

PPC1.  (U,  B  •  ♦  *  •  ?)   .  ?EC7; 

DEPISTOi  sw»,S»S,SW*-  VECT0R(U:f>)  :  INTEXPP; 

SS1,«»2  :  rvTFXPR; 

SiO  :  PILSE 

-.SHO  ==  SNtt  :  EITT  HORSEHSE; 

SRS  ,=  SH7  UNn  s?1  :  RETORH  F.21»8«6 

PLS*  REWORK  F.O 
"ND 

END  NONSPNSE 
END  HANTP 

"5L0BAL  DATA  STUCK: 

INTEGPR  VECTOR (1: 100)  PDLIST; 

INTEGER  'OP  INTT  0; 
FID  STACK; 

"LOBAL  STROCTORP  STACK: 

POSH:  OPERATOR  INTEGER  POSH:  STACK  BT  ADDRESS  RETTIRNS  BIT; 
POP:  TPERATOR  TBT»-ur  By  ADDRPSS:  POP  STACK  RETORRS  HIT; 

mSPECT:  OPERATOR  STACK.  (IRTEGEP)  ???  TRTEGER  RETORRS  TNTBGER; 
PND  STACK; 

DATA  posh; 

STACK  51;  TRTEGF"  T; 

"so  posh: 

POSH:  OPERATOR  INTEGER  I  °flSRI  :  STACK  SI  : 
DECISION  OVERFLOW  :  100  <=  T0P.S1 
AC'TON;  -.OVERFLOW  :  BFGIN 

pdi st.si (TOP.si:*rop.si*i)  :=i: 

RETORR  TROB; 
END; 
OVERFLOW  :  RFTORN  FALSE  : 
END 
PND  POSH 

DATA  P^P; 

STAC*  SI:  TWEGEW  I; 
PND  POP; 

POP:  OPPPATOR  TN^PGPR  T  :  POP!  STACK  51; 
DECISTON  UNDERFLOW:  TOP.S1  <=    1 
ACTON;  ONDERPT.OH:  RETORN  FALSE; 
ELSE  BEGIN 

I  :=  PDLIST.S1 (TOP.S1)  ; 
TOP.S1  :=  'OP. SI-  1; 
RETORN  TROE; 
END: 
END  POP; 

DATA  INSPPCT; 

TNTBGPR  2,I,.T;  STACK  S1 
END  INSPPCT 

r"SPECT:  OPERATOR  STA^K  SI.  (I)   ???  TNTPGBR  T; 

"OR    3    »<»OH    TOP. SI     STEP    -1    T'FPATB    RHILF    -1<=I; 

XV    X  ==  PPLIST.SI(J)  THEN  FPTPPR  J 

PNO; 

PPT0»N  0 
PND  INSPECT 


Figure 14 Continued
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DATA  IRRERPRODUCT; 

LORGIWTGFR  PRODUCT  TRIT  T.  0 ; 

IRTEGER  VECTOR  (*:*)  T1,7?; 

IRTEGER  I,J,U1,L1,U2 
ERD  IRRERPRODUCT; 

IRRERPPOOnCT:  OPERATOR  IRTEGER  1  EfTERTS  T1.{Ol,L1|  ♦  * 
(02). IRTEGER  1  EXTBRTS  »2; 

J:  =  1  ♦  1  HROURD  72; 

EOR  T  EROE  1  HBOURD  V1  D0RBT0  1  LBODRD  T1 

ITERATE  PRODUCT  :=  PR0DUCT*T1 (T) *V2 (J: =J-  1) 

ERT>; 

PETnRR  PRODUCT 
ERD  IRPERPRODDCT; 

3L0BAL  DATA  RATPTX3BT3; 

IRTEGER    fECTOR(3,3|     R1, EXTRA; 
ERD    EATRII3BT3: 

3L0BAL  STRUCTURE  RATRTX3BY3; 

NATRIXOP:  OPERATOR  RATRIX3BY3  BT  ADDRESS: (IRTEGER)  DUAL1DLT 
(INTEGER) :EATPIX3BY3  BT  1DDRESS  RETURNS  IRTEGER 
ERD  EATRIX3BT3; 

3L0RAL  DATA  CORPLEX; 

TRTEGEP  TITPAPTrIEAGPART 
ERD  CORPLEX; 

3L0BAL  STRUCTURE  CORPLEX; 

IBS:  OPERATOR  UBS  CORPLEX  RETURRS  IRTEGER; 
BRD  CORPLEX; 


Figure 14 Continued
Routine Block Test Data
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STRUCTURE  EXPR 

;  PPOrEDORP,  TUTP<1T  (CHARACTER)  .RBTORRS  LORGTRTEGER 
ERD  EXPR; 


OATA  EXPR 


TRTRGER  T,J,* 

tORGIHTPGRR  R,T,7 

BIT  B1,B2,B3 

THTRGEP  VBCTDRflO, 100)  ¥1,Y2 

CHARACTER^)  C1,C2,C3 


ERD  EXPR; 


PPOCEDBRE  BTOR; 
J  :=  -  T 
K  :=  r  * 
K  :=  (  I 
B1  :=  B1 
K 

T 


♦  ABS  J  ;  !  80, 66, 68, 67 

3  //   K  ♦  1  j        !  69, 70, 71, 8* 

:»■  T-  •  1  |  *  J  :=  J  //  2  ; 
AMD  B?  1R  -,B3;       !  70,76,120,77 


(J:  =T  *oo  f»l  **J; 

2  LPOtJRP  71  ; 


r 

T 
I 

T 
T 

r 

C2 


=  J  =  =3;  • 
■  J-=3; 

=  ,7>3;  ! 

=  1<3;  • 

=  J>=3;  ! 

=  J<=3;  ! 

:=  LERGTR  CI: 


78 
79 
80 
81 
82 
83 


C.  1  HBOORD  n 


C3 

C3 

C3 

C3 

C2 

V 

T 

9 

7 

Z 

V 

B1 

CI 

R 

P1 

n 

W 

C1 
R 


:=  C1  ->  C2; 

:=  ci  <-   C2: 
:=  CI  "->  C2; 
:=  CI  ?->  C2; 

:=  I.TNT  ?1; 

=  CHAR  9; 

=  ABS  -  »: 

=  R*r*7//P.2; 

*  i*»T; 

=  i  BOD  r: 

=  r==F.i*: 

:=  p2==R3; 

:=  c?  ; 
:=  T-=7: 

:=  B?  -.=  B3: 

:=  C2  -.»  CI; 
:=  r>7; 

:=  C2>C3: 
:  =  T<Z; 


ci  :=  C2<ri; 
v  :=  Y>=7: 

CI  :=  C2>=f3; 

W:=Y<=7; 

C1:=C2<=r3; 

ci  : =  ci  "  CI 

PRO  ETPR; 
( 


!  73,72,80 
!  75,  SHOOLD  GET  EBBOR  BESSAGE  OR  := 
!  STRCE  tBOORD  RBTORRS  "ERROR"  TTPE 


86 

87 

88 

89  SHOOLD  GET  ERROR  RBSSAGF  OR  <• 

90 

91 

93,119 

90,121 

15,96 

97,98,99,100 

101 

102 

103 

100 

105 

106 

108 

109 

111 

112 

113 

110 

115 

116 

117 

118 

9? 


Figure 15
Unusual Expressions Accepted

3) Productions 5.27 and 5.27b are not accepted for unary
operators. For example,

LOOKFOR'SYNTAB).TOKEN

is not accepted on the grounds of mismatched
parentheses. On the assumption that the specifications
were not complete (i.e., unary operators were perhaps
considered a special case), the following was tried:

LOOKFOR('SYNTAB).TOKEN

This caused the parser to enter a loop that vacillated
between the quote and some other unspecified token,
printing repeatedly

Illegal or inactive token. Bad token was '
Illegal or inactive token. Bad token was
Illegal or inactive token. Bad token was '

until a message was issued claiming the statement was
too long. This operator call was tried one last time,
deleting the quote altogether, but the parser objected
to the dot.

There are many more errors in the processing of expressions,
too numerous to list and too nebulous to describe intelligently.
Unfortunately, most of these cause fatal IBM/360 errors which
made further testing fruitless.

5   Semantic Analysis

As stated before, it was difficult to separate pure syntax
from semantics since these were implemented concurrently.
Section 4 covered type-checking and some other semantics along
with syntax. This section is concerned with the correctness of
the symbol table complex and the intermediate text.

5.1   Symbol Table Complex

The major data base for the compiler is the symbol table
complex, which includes the name table, the hash table, the type
analysis table (token attributes and some scope information), the
constant table, the level/configuration table (activation
information), the configuration table, and the type table
(parameter information). These tables are first inspected for the
correctness of the system predefined symbol entries and then for
user-defined entries.

5.1.1   Predefined Symbols

The system's predefined symbols are locations 1 through 122,
and all of these entries in all the tables listed above are
initialized via the PL/I INIT option before the compilation of
the user program.

The hash table is incorrectly initialized, as discussed in
section 4.3: the entry <- has no pointer to it, while " has two
pointers, one of which is never used. Unfortunately, the repair
is not as simple as changing the unused "-pointer to point to <-;
the repair is documented elsewhere. All other entries are
correct.
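
The kind of check that exposes this fault can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hash function `toy_hash`, the four-bucket layout, and the three-entry table are all hypothetical, since the compiler's real hash function and bucket layout are not reproduced here. The sketch walks the chain for each name and reports names that cannot be reached and entries that are pointed to more than once.

```python
from typing import List, Optional, Tuple


def toy_hash(name: str, num_buckets: int) -> int:
    """Hypothetical hash function; the compiler's real one is not given."""
    return sum(ord(ch) for ch in name) % num_buckets


def audit_hash_links(
    names: List[str],
    heads: List[Optional[int]],
    links: List[Optional[int]],
) -> Tuple[List[str], List[str]]:
    """Return (unreachable names, names referenced by more than one pointer).

    names[i] is the token stored in entry i, heads[b] is the first entry
    of bucket b, and links[i] is the entry that follows entry i in its chain.
    """
    ref_count = [0] * len(names)
    for pointer in list(heads) + list(links):
        if pointer is not None:
            ref_count[pointer] += 1

    unreachable = []
    for index, name in enumerate(names):
        current = heads[toy_hash(name, len(heads))]
        seen = set()
        # Follow the chain until the entry is found, the chain ends,
        # or a cycle is detected.
        while current is not None and current != index and current not in seen:
            seen.add(current)
            current = links[current]
        if current != index:
            unreachable.append(name)

    multiply_referenced = [names[i] for i, n in enumerate(ref_count) if n > 1]
    return unreachable, multiply_referenced


# A deliberately faulty table: the bucket that should lead to "<-"
# points at the concatenate token instead, which therefore has two pointers.
names = ['"', "<-", "->"]
heads = [None, 0, 0, 2]
links = [None, None, None]
print(audit_hash_links(names, heads, links))
```

Run over the predefined entries before any user program is compiled, an audit of this shape would flag both symptoms at once: a token with no pointer and a token with a redundant one.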

The type field (giving the type returned or the type of the
identifier) of the type analysis table is also incorrectly
initialized, as discussed in section 4.3. Entries 78 through 83
and 103 through 118, the comparison operators, should contain 8,
indicating a bit value is to be returned. The returned types of
LINT and CHAR are correctly initialized to longinteger and
character respectively, but in actuality they return the reverse
of this. (It is unclear why this is the case.) LENGTH, HBOUND,
"-> and ?-> are initialized to return character results, but all
of these should return integer results. All other entries of the
type analysis table are correct.
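
A table audit for the comparison operators is easy to mechanize. The Python sketch below is illustrative only: the entry ranges and the code 8 for bit come from the paragraph above, while the dictionary representation of the type field and the other type codes used in the example are assumptions.

```python
BIT = 8  # type code for a bit result, as stated in the text

# Symbol table locations of the comparison operators, per the text.
COMPARISON_ENTRIES = list(range(78, 84)) + list(range(103, 119))


def wrong_comparison_entries(type_field):
    """Return the comparison-operator locations whose type field is not BIT.

    type_field maps a symbol table location to its stored type code
    (a hypothetical representation of the type analysis table).
    """
    return [loc for loc in COMPARISON_ENTRIES if type_field.get(loc) != BIT]


# Example: a table that is correct except for two entries.
# The codes 1 and 3 are made-up placeholders for non-bit types.
table = {loc: BIT for loc in COMPARISON_ENTRIES}
table[80] = 1
table[110] = 3
print(wrong_comparison_entries(table))
```

The same pattern extends to the other operators named above by pairing each location with its specified result type instead of a single constant.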

The type table field defining the types of procedure
parameters and operator operands also has some incorrectly
initialized entries. The operators HBOUND, -> and <- are all set
to accept character operands, while the first should take an
integer operand and the last two should take an integer and a
character operand and the reverse, respectively. Again the
entries of LINT and CHAR appear to be reversed. All other
entries are correct.

The name table, the constant table, the configuration table
and the level/configuration table are correctly initialized for
the predefined symbols.

5.1.2   User-defined Symbols

Information about user-defined symbols is added to the
symbol table as it is determined from the syntax. For a symbol
which occurs in an error context, the information in its entry
may not be complete or correct, which is understandable.
However, in some cases these errors can have adverse side-effects
on other entries in the symbol table by overwriting them.

For instance, the token :=> was mistakenly entered as an
operator. In the operator definition in the program, the symbol
table lookup routine was in declare-token mode when it reached
the :=>, correctly isolated it as the special symbol :=, and
proceeded to change that entry to be a unary operator, which :=>
was supposed to be. Then when the parser reached the >, it
correctly issued an error message claiming an unexpected token,
but entered the return type of error in the := symbol table
location (and the one before it was changed also, since
configurations use two consecutive rows of the symbol table).
The reason this is mentioned is that if later in the program
there are integer assignment statements, much confusion will
result. The point is that the predefined symbol entries should
have a higher degree of protection.
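
One way to provide that protection is to refuse any declaration that targets a predefined location. The Python sketch below assumes a much simpler table than the real symbol table complex; the class, the method names and the location chosen for := are hypothetical, and only the range 1 through 122 is taken from the text.

```python
class ProtectedEntryError(Exception):
    """Raised when a declaration would overwrite a predefined symbol."""


class SymbolTable:
    LAST_PREDEFINED = 122  # locations 1..122 hold predefined symbols

    def __init__(self):
        self.entries = {}

    def load_predefined(self, location, name, kind):
        """Initialize a predefined entry before compilation starts."""
        if not 1 <= location <= self.LAST_PREDEFINED:
            raise ValueError("predefined symbols occupy locations 1..122")
        self.entries[location] = {"name": name, "kind": kind}

    def declare(self, location, name, kind):
        """Enter or change a user symbol; predefined entries are read-only."""
        if location <= self.LAST_PREDEFINED:
            raise ProtectedEntryError(
                "location %d is predefined and cannot be redeclared" % location
            )
        self.entries[location] = {"name": name, "kind": kind}


table = SymbolTable()
table.load_predefined(5, ":=", "assignment")  # location 5 is made up
try:
    # The faulty case: declare-token mode tries to turn := into an operator.
    table.declare(5, ":=>", "unary operator")
except ProtectedEntryError as err:
    print("rejected:", err)
print(table.entries[5])
```

With a guard of this kind, the mistaken operator definition would still produce an error, but the entry for := would survive intact for later assignment statements.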

With the exception of the problem described in section 1.1,
which caused several different tokens to be placed in the same
location of the symbol table, thus making more than one hash link
point to that location, no other errors have been detected in the
hash links of inserted items.

No errors have been detected in the configuration table and
level/configuration table, with one exception: all the
predecessor links in the level/configuration table remain zero
throughout the compilation of any program. Perhaps this pointer
is not explicitly used, since the "father" link of the
configuration table seems to contain the same information. Also,
no errors have been found in the manipulation of the constant
table, name table or the type table except for those tokens found
in an error context.

5.2   Intermediate  Text 

One problem with the testing of the intermediate text was
that it was difficult to construct a program that had few errors
and therefore a readable intermediate text. A second problem was
that by this point in the project it had become clear that the
CLEOPATRA code generator contained some serious incompatibilities
which prevented the linking of the two passes of the compiler.
But the programs presented thus far were "doctored up" a bit and
the intermediate text from their compilation was analysed.
Happily enough, no errors were detected.

It might seem somewhat suspicious that the semantics of
nesting levels and intermediate text generation, much more
complicated tasks than lexical and syntactic analysis, would be
relatively error-free, while the "easier" parts of the compiler
were bug-ridden. There could be two reasons for this. Perhaps
the implementor spent a greater amount of time and care in
constructing the more difficult parts, knowing that the chance of
error was greater, and spent less time and attention to detail on
the "easier" parts. It could also be true that the test programs
that could be used for testing in this phase were very simple,
due to the errors in the previous phase of the compiler. Be that
as it may, Chapter 4 certainly documents enough errors for two
chapters.

6   Conclusions


The three parts of the CLEOPATRA analysis phase have now
been tested. A thorough test of the lexical analysis uncovered a
moderate number of errors, all of which were rectified, thus
establishing a high level of reliability for the scanner.
Reasonably thorough tests, considering the complexity of
CLEOPATRA, were run against the syntax analyser; again only a
moderate number of purely syntactic discrepancies were found (but
not yet corrected). When the documented repairs are implemented,
the parser will be acceptably close to the specifications.
However, the syntax tests were confounded by the shambles of the
type-checking and expression-handling portions of the semantic
analyzer. The remainder of the semantic analyzer, intermediate
text generation, is fairly good.

Evaluation of this project is difficult. In one sense, it
was very successful since so many errors were found, and that was
the purpose of the testing. However, one of the basic
assumptions of a validation effort is that the code has been
debugged already by the implementor to remove the more mechanical
errors of coding; it is difficult to believe that this was done.
Too many of the errors found would have become immediately
obvious if even the most superficial testing had been performed.
For example, any test program that contained even a very simple
expression (and one that does not is rare) would have caused the
compiler to come crashing down. As a result, this project was
"successful" as a debugging venture.




Unfortunately, the project was not as successful as a
validation effort; the answer to the question "Is this
implementation of a CLEOPATRA analysis phase a valid one?" would
be an unequivocal "NO!". At least not for this version of the
implementation, because the validation process is not really
completed. The documented errors must first be repaired, then
the tests must be rerun. No doubt more errors will be found,
requiring the process to be repeated.

Despite the pessimism exhibited above, the future of the
CLEOPATRA project is promising. Plans include the production of
an improved version of the implementation to incorporate
corrections of the errors described in this thesis, and the
retesting of the new version using the data that has been
presented here. The result should be a well-written, reliable
analysis phase implementation which will provide a solid basis
for future CLEOPATRA expansion.




References 


[1] Schreiner, Axel T., "CLEOPATRA Comprehensive Language for
Elegant Operating System and Translator Design", Report
UIUCDCS-R-74-6tt6, Department of Computer Science, University
of Illinois, May 1974.

[2] Schreiner, Axel T., "CLEOPATRA A Proposal for Another System
Implementation Language", Report UIUCDCS-R-74-65m,
Department of Computer Science, University of Illinois, June
1974.

[3] Halbur, John D., "A Code Generator for the CLEOPATRA
Language", Report UIUCDCS-R-75-739, Department of Computer
Science, University of Illinois, July 1975.

[4] Halbur, John D., "CLEOPATRA Code Generator User's Guide",
Report UIUCDCS-R-76-mo, Department of Computer Science,
University of Illinois, January 1976.

[5] Fisher, Scott H., "Implementation of the Language
CLEOPATRA: the Analysis Pass," Master's thesis, University
of Illinois, September 1976.

[6] Gruenberger, Fred, "Program Testing and Validating",
Datamation, Vol. 14, No. 7 (July 1968), pp. 39-47.




[7] Elmendorf, W. R., "Controlling the Functional Testing of an
Operating System", IEEE Transactions on Systems Science and
Cybernetics, Vol. SSC-5, No. 4 (October 1969), pp. 284-290.

[8] Brown, A. R., and Sampson, W. A., Program Debugging,
American Elsevier, New York, 1973.

[9] Goodenough, John B., and Gerhart, Susan L., "Toward a Theory
of Test Data Selection", IEEE Transactions on Software
Engineering, Vol. SE-1, No. 2 (June 1975), pp. 156-173.

[10] London, R. L., "Proving Program Correctness: Some Techniques
and Examples," BIT, Vol. 10, No. 2 (September 1970),
pp. 168-182.

[11] Huang, J. C., "An Approach to Program Testing", Computing
Surveys, Vol. 7, No. 3 (September 1975), pp. 113-128.

[12] Hetzel, William C., "Principles of Computer Program
Testing", Program Test Methods, Prentice-Hall, Inc.,
Englewood Cliffs, N.J., 1973.




Appendix:   Summary of Productions Implemented

This appendix summarizes the productions that define
CLEOPATRA as it is implemented. Upper-case symbols and
punctuation designate terminal symbols of the language;
lower-case symbols represent non-terminals. Other notation used is:

|     indicates a choice

{ }   combine a number of choices

[ ]   the enclosed entity may appear or may be omitted

[ ]*  the enclosed entity may appear zero or more times

{ }*  the enclosed entity must appear one or more times

Since the character | is also a member of the CLEOPATRA
character set, it will be underlined when it denotes itself
as a terminal symbol.




(1.1)  letter ::= A | B | C | D | E | F | G | H | I | J | K |
           L | M | N | O | P | Q | R | S | T | U | V | W | X |
           Y | Z | a | b | c | d | e | f | g | h | i | j | k |
           l | m | n | o | p | q | r | s | t | u | v | w |
           x | y | z

(1.2)  delimiting_character ::= ( | ) | . | : | , | ' | ; |
           blank_character | _

(1.3)  special_character    ::=   aff|S|%ict*j-t    +    |    = 

I  «  I  »  I  /  l'<  !  >  I  1  1  *  I  •"■ 

(1.4)  digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

(1.5)  control ::= !

(1.6)  comment ::= COMMENT any sequence of characters with the
           exception of a semicolon ;

(1.7)  comment ::= ! any sequence of characters with the
           exception of an end of the source record
           end of source record




(2.1)  identifier ::= letter [letter | digit]*

(2.2)  operator ::= {special_character}* | identifier

(2.3)  constant ::= integer | long_integer | bit |
           literal_value

(2.4)  decimal_digit ::= digit

(2.5)  hexadecimal_digit ::= digit | A | B | C | D | E | F

(2.7)  binary_digit ::= 0 | 1

(2.8)  minus_symbol ::= -

(2.9)  decimal_string ::= [minus_symbol] {decimal_digit}*

(2.10) hexadecimal_string ::= T. [minus_symbol]
           {hexadecimal_digit}*

(2.12) binary_string ::= B. [minus_symbol] {binary_digit}*

(2.13) basic_constant ::= decimal_string | hexadecimal_string
           | binary_string

(2.14) integer ::= basic_constant

(2.15) long_integer ::= F. decimal_string

(2.17) bit ::= S. binary_digit

(2.28) literal_value ::= C. any sequence of characters up to
           but not including the first following
           blank_character

(2.29) system_supplied_value ::= basic_type NIL |
           { LONGINTEGER | INTEGER } { LARGE | SMALL }

(2.30) system_supplied_constant ::= FALSE | FIRST | LAST |
           TRUE




(4.1)  basic_ref_type ::= INTEGER | LONGINTEGER | BIT |
           CHARACTER

(4.2)  basic_type ::= INTEGER | LONGINTEGER | BIT |
           CHARACTER [ (expression) ]

(4.4)  ref_type ::= { basic_ref_type | type_name }
           [integer EXTENTS]

(5.1)  structure_block ::= STRUCTURE configuration_name
           { ; link_item }* [;] END configuration_name ;

(5.2)  configuration_name ::= procedure_name | type_name |
           operator_link

(5.3)  link_item ::= TYPE type_name [ALIAS identifier] |
           global_link_item

(5.4)  global_link_item ::= PROCEDURE procedure_name
           [ALIAS identifier] [ref_type_list .]
           RETURNS basic_ref_type

(5.5)  global_link_item ::= operator_link : OPERATOR
           [left_ref_types] operator [ALIAS identifier]
           right_ref_types RETURNS basic_type

(5.10) ref_type_list ::= (type_formal [, type_formal]*)

(5.11) type_formal ::= ref_type [BY ADDRESS]

(5.12) left_ref_types ::= { basic_ref_type | type_name }
           [BY ADDRESS :
           ref_type_list |
           . ref_type_list ]

(5.13) right_ref_types ::= ref_type | : ref_type BY ADDRESS
           | ref_type_list { . ref_type | : ref_type
           BY ADDRESS }

(5.14) global_structure_block ::= GLOBAL STRUCTURE type_name
           { ; global_link_item }* [;] END type_name [;]

(5.15) type_name ::= identifier

(5.16) routine_block ::= PROCEDURE procedure_name
           [ name_list ] { ; statement }* [;] END
           procedure_name [;]

(5.17) procedure_name ::= identifier

(5.18) name_list ::= (formal [, formal]*)

(5.19) formal ::= identifier

(5.20) procedure_call ::= procedure_name [parameters]

(5.21) parameters ::= (actual [, actual]*)

(5.22) actual ::= expression

(5.23) routine_block ::= operator_link : OPERATOR [ left_names ]
           operator right_names { ; statement }* [;] END
           operator_link [;]

(5.24) operator_link ::= identifier

(5.25) left_names ::= basic_ref_type identifier
           [ { : | . } name_list | : ]

(5.26) right_names ::= [:] basic_ref_type identifier |
           name_list
           { : | . } basic_ref_type identifier

(5.27) operator_call ::= [expression [{ : | . } [params1]]]
           operator [ [params2] { : | . }] expression

(5.27a) params1 ::= (actual [, actual]*'

(5.27b) params2 ::= 'actual [, actual]*)

(5.29) data_block ::= DATA configuration_name
           { ; [CONSTANT | DEFER] data_group }*
           [;] END configuration_name [;]

(5.30) global_data_item ::= GLOBAL DATA configuration_name
           { ; [CONSTANT | DEFER] data_group }*
           [;] END configuration_name [;]

(5.31) type_data_block ::= GLOBAL DATA type_name
           { ; basic_type [array] item [, item]* }*
           [;] END type_name [;]




(5.32) data_group ::= { basic_type | type_name } [array]
           item [, item]*

(5.34) array ::= { RIGHT | VECTOR } (bound [, bound]*)

(5.35) bound ::= { *:* | [integer :] integer }

(5.36) item ::= identifier [INIT constant]

(6.1)  expression ::= constant | system_supplied_value |
           system_supplied_constant | array_reference |
           procedure_call | operator_call | (expression)

(6.2)  array_reference ::= array_expression

(6.3)  array_expression ::= identifier




(7.1)  stmt ::= expression | NIL

(7.2)  stmt ::= RETURN expression

(7.3)  stmt ::= EXIT [label]

(7.4)  stmt ::= IF expression THEN statement
           [ [;] ELSE statement ]

(7.7)  lstmt ::= cstmt | label : cstmt label

(7.8)  label ::= identifier

(7.9)  cstmt ::= BEGIN statement [; statement]* [;] END

(7.10) cstmt ::= [for_phrase] ITERATE [while_phrase]
           statement [; statement]* [;] [when_phrase] END

(7.11) for_phrase ::= FOR identifier [ FROM expression ]
           { { DOWNTO | UPTO } expression [;] |
           STEP expression [;] [ { DOWNTO | UPTO }
           expression [;] ] }

(7.12) while_phrase ::= WHILE expression ;

(7.13) when_phrase ::= WHEN expression [;]

(7.14) stmt ::= DECISION decision [; decision]*
           [;] ACTION action [; action]*
           [;] [ ELSE statement [;] ] END

(7.15) decision ::= switch : expression

(7.16) decision ::= switch {, switch}* [ list_layout ] :
           list_index

(7.17) switch ::= identifier

(7.18) list_layout ::= VECTOR (integer : integer)

(7.19) list_index ::= expression

(7.20) action ::= switch_expression : statement

(7.21) switch_expression ::= [¬] switch [ switch_operator [¬]
           switch ]*

(7.22) switch_operator ::= == | ¬= | AND | OR

(7.23) statement ::= stmt | lstmt
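
As an illustration of how the notation reads in practice, the constant productions (2.9), (2.12), (2.15), (2.17) and (2.28) can be transcribed into regular expressions. The Python sketch below is not part of the implementation described in this thesis. It assumes that the long-integer prefix is F., omits the hexadecimal form, and treats a literal as ending at the first blank; the function name and the table of patterns are hypothetical.

```python
import re

# Each pair is (production result, pattern). Prefixed forms come first so
# that a plain decimal string is tried only after the lettered forms fail.
PATTERNS = [
    ("bit", re.compile(r"S\.[01]")),              # (2.17) S. binary_digit
    ("long_integer", re.compile(r"F\.-?[0-9]+")), # (2.15) F. decimal_string
    ("integer", re.compile(r"B\.-?[01]+")),       # (2.12) binary_string
    ("integer", re.compile(r"-?[0-9]+")),         # (2.9)  decimal_string
    ("literal_value", re.compile(r"C\.[^ ]*")),   # (2.28) up to first blank
]


def classify_constant(text):
    """Return the kind of constant that text denotes, or None if it is none."""
    for kind, pattern in PATTERNS:
        if pattern.fullmatch(text):
            return kind
    return None


for sample in ["S.1", "F.-12", "B.01001", "-42", "C.ABC", "S.10", "12A"]:
    print(sample, "->", classify_constant(sample))
```

Because `{ }*` means one or more, every digit string pattern uses `+`, and because (2.17) allows exactly one binary digit, a token such as S.10 is rejected rather than read as a bit.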
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